
    A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology

    The widespread availability of high-dimensional biological data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such datasets continues to increase, the problem of teasing out the effects of biomarkers in studies measuring baseline confounders while avoiding model misspecification remains only partially addressed. Efficient estimators constructed from data adaptive estimates of the data-generating distribution provide an avenue for avoiding model misspecification; however, in high-dimensional problems requiring simultaneous estimation of numerous parameters, standard variance estimators have proven unstable, resulting in unreliable Type-I error control under standard multiple testing corrections. We present a general approach for applying empirical Bayes shrinkage to asymptotically linear estimators of parameters defined in the nonparametric model. The proposal applies existing shrinkage estimators to the estimated variance of the influence function, allowing for increased inferential stability in high-dimensional settings. A methodology for nonparametric variable importance analysis of high-dimensional biological datasets with modest sample sizes is introduced, and the proposed technique is demonstrated to be robust in small samples even when relying on data adaptive estimators that eschew parametric forms. Use of the proposed variance moderation strategy in constructing stabilized variable importance measures of biomarkers is demonstrated by application to an observational study of occupational exposure. The result is a data adaptive approach for robustly uncovering stable associations in high-dimensional data with limited sample sizes.
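
The variance-moderation idea can be sketched as follows. This is a minimal Python illustration, assuming the estimated influence-function values for each biomarker are already available and stored column-wise in an array. The function names, the fixed prior degrees of freedom `d0`, and the use of the average variance as the prior value are simplifying assumptions for illustration (limma-style methods estimate both hyperparameters from the data); this is not the exact procedure from the paper.

```python
import numpy as np


def moderate_variances(s2, d, d0, s2_prior):
    """Shrink per-feature variances toward a common prior value.

    Uses the limma-style posterior form (d0 * s2_prior + d * s2) / (d0 + d),
    where d is the residual degrees of freedom of each raw variance.
    """
    s2 = np.asarray(s2, dtype=float)
    return (d0 * s2_prior + d * s2) / (d0 + d)


def moderated_t_stats(eif, psi, d0=4.0):
    """Moderated t-statistics built from influence-function estimates.

    eif : (n, p) array; column j holds estimated influence-function values
          for biomarker j (hypothetical input format).
    psi : (p,) array of point estimates, one per biomarker.
    d0  : prior degrees of freedom, treated here as a fixed assumption.
    """
    eif = np.asarray(eif, dtype=float)
    n = eif.shape[0]
    s2 = eif.var(axis=0, ddof=1)          # per-biomarker variance of the influence function
    s2_prior = float(np.mean(s2))          # simplified prior: average variance across biomarkers
    s2_mod = moderate_variances(s2, n - 1, d0, s2_prior)
    # Standard error of an asymptotically linear estimator: sqrt(var(EIF) / n)
    return np.asarray(psi, dtype=float) / np.sqrt(s2_mod / n)


rng = np.random.default_rng(0)
eif_example = rng.normal(size=(20, 5))     # 20 samples, 5 biomarkers (simulated)
print(moderated_t_stats(eif_example, eif_example.mean(axis=0)))
```

Because every biomarker's variance is pulled toward a shared value, features whose variances happen to be estimated as very small no longer produce inflated test statistics, which is the source of instability the abstract describes.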

    Revisiting the propensity score's central role: Towards bridging balance and efficiency in the era of causal machine learning

    About forty years ago, in a now-seminal contribution, Rosenbaum & Rubin (1983) introduced a critical characterization of the propensity score as a central quantity for drawing causal inferences in observational study settings. In the decades since, much progress has been made across several research fronts in causal inference, notably including the re-weighting and matching paradigms. Focusing on the former, and specifically on its intersection with machine learning and semiparametric efficiency theory, we re-examine the role of the propensity score in modern methodological developments. Because Rosenbaum & Rubin's (1983) contribution spurred a focus on the balancing property of the propensity score, we re-examine the degree to which, and how, this property plays a role in the development of asymptotically efficient estimators of causal effects; moreover, we discuss a connection between the balancing property and efficient estimation in the form of score equations and propose a score test for evaluating whether an estimator achieves balance. Comment: Accepted for publication in a forthcoming special issue of Observational Studies
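
The balancing property can be written as the identity E[(A / g(W) - 1) h(W)] = 0, which holds for the true propensity score g(W) = P(A = 1 | W) and any function h of the covariates. A minimal Python sketch of the corresponding empirical check follows. It assumes a binary treatment and externally supplied propensity estimates; the function name `balance_score` is hypothetical, and this is a simple diagnostic, not the score test proposed in the paper.

```python
import numpy as np


def balance_score(a, g, h_w):
    """Empirical mean of (A / g(W) - 1) * h(W) for each column of h_w.

    a   : (n,) binary treatment indicators.
    g   : (n,) estimated propensity scores P(A = 1 | W), assumed in (0, 1).
    h_w : (n,) or (n, k) array of covariate functions h(W) to check.
    Values near zero suggest that inverse-probability weighting balances h(W).
    """
    a = np.asarray(a, dtype=float)
    g = np.asarray(g, dtype=float)
    h_w = np.asarray(h_w, dtype=float)
    if h_w.ndim == 1:
        h_w = h_w[:, None]          # treat a single function as one column
    resid = a / g - 1.0             # inverse-weighted treatment residual
    return (resid[:, None] * h_w).mean(axis=0)


# Simulated example with the true propensity score (assumed logistic model)
rng = np.random.default_rng(0)
w = rng.normal(size=10_000)
g_true = 1.0 / (1.0 + np.exp(-w))
a = rng.random(10_000) < g_true
print(balance_score(a, g_true, np.column_stack([w, w**2])))
```

With the true propensity score the entries should be close to zero up to sampling noise; a misspecified g would typically leave some functions h(W) visibly unbalanced.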

    A nonparametric framework for treatment effect modifier discovery in high dimensions

    Heterogeneous treatment effects are driven by treatment effect modifiers, pre-treatment covariates that modify the effect of a treatment on an outcome. Current approaches for uncovering these variables are limited to low-dimensional data, data with weakly correlated covariates, or data generated according to parametric processes. We resolve these issues by developing a framework for defining model-agnostic treatment effect modifier variable importance parameters applicable to high-dimensional data with arbitrary correlation structure, deriving one-step, estimating-equation, and targeted maximum likelihood estimators of these parameters, and establishing these estimators' asymptotic properties. This framework is showcased by defining variable importance parameters for data-generating processes with continuous, binary, and time-to-event outcomes with binary treatments, and by deriving accompanying multiply robust and asymptotically linear estimators. Simulation experiments demonstrate that these estimators' asymptotic guarantees are approximately achieved in realistic sample sizes for observational and randomized studies alike. The framework is applied to gene expression data collected for a clinical trial assessing the effect of a monoclonal antibody therapy on disease-free survival in breast cancer patients. Genes predicted to have the greatest potential for treatment effect modification have previously been linked to breast cancer. An open-source R package implementing this methodology, unihtee, is made available on GitHub at https://github.com/insightsengineering/unihtee
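
As one illustration of a model-agnostic modifier importance measure, consider the slope of the best linear approximation of the conditional average treatment effect (CATE) in each covariate, Cov(CATE(W), W_j) / Var(W_j). The Python sketch below assumes this parameter definition (it may differ from the parameters defined in the paper or implemented in unihtee), a binary treatment, and nuisance estimates (propensity score `g` and outcome regressions `q1`, `q0`) supplied by any learner. It forms doubly robust AIPW pseudo-outcomes and computes the empirical slope for each covariate, which for this assumed parameter reduces to a one-step estimator. All function names are hypothetical.

```python
import numpy as np


def aipw_pseudo_outcomes(y, a, g, q1, q0):
    """Doubly robust pseudo-outcomes whose conditional mean given W is the CATE.

    y, a : (n,) outcome and binary treatment.
    g    : (n,) estimated P(A = 1 | W).
    q1, q0 : (n,) estimated E[Y | A = 1, W] and E[Y | A = 0, W].
    """
    y, a, g, q1, q0 = (np.asarray(v, dtype=float) for v in (y, a, g, q1, q0))
    q_a = np.where(a == 1, q1, q0)                 # prediction at the observed arm
    # (a - g) / (g (1 - g)) equals 1/g if treated and -1/(1-g) if untreated
    return (a - g) / (g * (1 - g)) * (y - q_a) + q1 - q0


def tem_importance(w, y, a, g, q1, q0):
    """Per-covariate slope cov_n(Gamma, W_j) / var_n(W_j) (assumed parameter)."""
    w = np.asarray(w, dtype=float)
    gamma = aipw_pseudo_outcomes(y, a, g, q1, q0)
    wc = w - w.mean(axis=0)                        # center each covariate
    cov = (wc * (gamma - gamma.mean())[:, None]).mean(axis=0)
    return cov / (wc**2).mean(axis=0)


# Simulated randomized trial: only the first covariate modifies the effect
rng = np.random.default_rng(1)
n = 5000
w = rng.normal(size=(n, 3))
a = rng.binomial(1, 0.5, size=n)
y = 2.0 * w[:, 0] * a + w[:, 1] + rng.normal(size=n)
print(tem_importance(w, y, a, np.full(n, 0.5), 2.0 * w[:, 0] + w[:, 1], w[:, 1]))
```

In this simulation the first entry should be near 2 and the others near 0, since the true CATE is 2 W_1; covariates that affect the outcome but not the treatment effect (such as the second) are not flagged.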

    Disability-adjusted life-years (DALYs) for 315 diseases and injuries and healthy life expectancy (HALE) in Iran and its neighboring countries, 1990–2015

    BACKGROUND: Summary measures of health are essential for making estimates of health status that are comparable across time and place. They can be used to assess the performance of health systems, inform effective policy making, and monitor the progress of nations toward the sustainable development goals. The Global Burden of Diseases, Injuries, and Risk Factors Study 2015 (GBD 2015) provides disability-adjusted life-years (DALYs) and healthy life expectancy (HALE) as its main summary measures of health. We assessed trends in health status in Iran and 15 neighboring countries using these summary measures. METHODS: We used the results of GBD 2015 to present the levels and trends of DALYs, life expectancy (LE), and HALE in Iran and its 15 neighboring countries from 1990 to 2015. For each country, we assessed the ratio of observed levels of DALYs and HALE to those expected based on the socio-demographic index (SDI), an indicator composed of total fertility rate, income per capita, and average years of schooling. RESULTS: The all-age number of DALYs in Iran exceeded 19 million in 2015 and has remained stable over the past two decades, despite decreasing all-age and age-standardized rates. The all-cause DALY rate decreased from 47,200 per 100,000 in 1990 to 28,400 per 100,000 in 2015. The share of non-communicable diseases (NCDs) in DALYs increased in Iran (from 42% to 74%) and in all of its neighbors between 1990 and 2015; the pattern of change is similar in almost all 16 countries. The DALY rates for NCDs and injuries in Iran were higher than the global rates and the average rate in high-middle SDI countries, while those for communicable, maternal, neonatal, and nutritional disorders were much lower in Iran. Among men, cardiovascular diseases ranked first in all countries of the region except Bahrain; among women, they ranked first in 13 countries. Life expectancy and HALE increased consistently in all countries. However, disparities remain, with generally low LE and HALE in Afghanistan and Pakistan and high LE and HALE in Qatar, Kuwait, and Saudi Arabia. Iran ranked 11th in LE at birth and 12th in HALE at birth in 1990, improving to 9th for both metrics in 2015. Turkey and Iran had the largest increases in LE and HALE from 1990 to 2015, while the smallest increases were observed in Armenia, Pakistan, Kuwait, Kazakhstan, Russia, and Iraq. CONCLUSIONS: The levels and trends in causes of DALYs, life expectancy, and HALE are generally similar across the 16 countries, although differences exist. These differences can be attributed to many determinants, including social, cultural, ethnic, religious, political, economic, and environmental factors, as well as the performance of the health system. Investigating the differences between countries can inform more effective health policy and resource allocation. Concerted efforts at national and regional levels are required to tackle the emerging burden of non-communicable diseases and injuries in Iran and its neighbors.
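
The reported fall in Iran's all-cause DALY rate can be expressed as a relative change. The short Python check below uses only the two rates quoted in the abstract (47,200 and 28,400 per 100,000); the helper name `percent_decline` is hypothetical.

```python
def percent_decline(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# All-cause DALY rates per 100,000 in Iran, 1990 and 2015 (from the abstract)
print(round(percent_decline(47_200, 28_400), 1))  # → 39.8
```

A decline of roughly 40% in the rate alongside a stable all-age count is consistent with population growth and ageing offsetting the improvement in per-capita burden.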